Let the high-dimensional space be a Euclidean space, \(\mathbb{R}^p,~~p>3\),
where \(p\) is the number of numeric variables.
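
A linear projection from \(\mathbb{R}^p\) down to 2 dimensions multiplies the data by a \(p \times 2\) orthonormal basis. The base-R sketch below is illustrative only: the simulated data and the names `X`, `basis`, and `proj` are assumptions, not part of spinifex or tourr.

```r
# Minimal sketch: project n x p numeric data onto a random 2D orthonormal basis.
# Simulated data stand in for real measurements; all names here are illustrative.
set.seed(2018)
n <- 74; p <- 6
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Orthonormalize a random p x 2 matrix with a QR decomposition.
basis <- qr.Q(qr(matrix(rnorm(p * 2), nrow = p, ncol = 2)))

# Each row of proj is one observation seen in the 2D projection plane.
proj <- X %*% basis
dim(proj)
```

A tour can be thought of as a smooth sequence of such bases, each giving a different 2D view of the same data.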
 

 

nspyrison.github.io/user2018
Suggested viewing:
125% zoom from Firefox or Chrome

spinifex

Data - flea

74 obs x 6 var of physical measurements taken across 3 species of flea beetles. Methods are unsupervised, but data are colored by species.

tourr

comparison

  • Principal component analysis (PCA): \(p\) ordered linear combinations of the \(p\) dimensions. Plot PC1 vs. PC2.
  • t-distributed stochastic neighbor embedding (t-SNE): \(p\) unordered non-linear combinations of the \(p\) dimensions. Run PCA on the embedding and plot PC1 vs. PC2.
  • Tour (holes-optimized): stochastic gradient optimization of the white space in the middle of projections (from \(p\) down to 2 dimensions).

Method       Interpretable  Max_Var_Retention  Global_Optima  Cannot_Overfit  For_Nonlinear_Data
PCA          TRUE           FALSE              TRUE           TRUE            FALSE
t-SNE        FALSE          NA                 FALSE          FALSE           TRUE
Tour, holes  TRUE           TRUE               FALSE          TRUE            FALSE
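
The holes index rewards projections with few points near the center. The base-R sketch below follows one common form of the index from the projection pursuit literature. `holes_index` is a hypothetical helper written for illustration, not the tourr implementation, and it assumes the data were sphered before projection.

```r
# Hypothetical helper: holes index of a projected data matrix (n x d).
# Assumes the data were sphered before projecting; larger values mean
# fewer points near the center of the projection.
holes_index <- function(proj) {
  proj <- as.matrix(proj)
  n <- nrow(proj)
  d <- ncol(proj)
  numerator <- 1 - sum(exp(-0.5 * rowSums(proj^2))) / n
  denominator <- 1 - exp(-d / 2)
  numerator / denominator
}

# A ring of points leaves the center empty, so it scores higher
# than a cloud concentrated at the origin.
theta <- seq(0, 2 * pi, length.out = 100)
ring <- cbind(2 * cos(theta), 2 * sin(theta))
blob <- matrix(0, nrow = 100, ncol = 2)
holes_index(ring) > holes_index(blob)
```

A guided tour searches over projection bases for one that increases an index of this kind, which is why it can stall in a local optimum.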

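f <- tourr::flea[, 1:6]  # numeric measurements only; species is used for color
f.pca <- stats::prcomp(f)
ggplot2::ggplot(as.data.frame(f.pca$x)) + ...

f.tsne <- Rtsne::Rtsne(f, ...)
f.tsne.pca <- stats::prcomp(f.tsne$Y)  # $Y holds the embedded coordinates
ggplot2::ggplot(as.data.frame(f.tsne.pca$x)) + ...

f.holes_end <- tourr::animate_xy(f, tourr::guided_tour(index = tourr::holes))
ggplot2::ggplot(f.holes_end) + ...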

Variation lost from dimension reduction
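
One way to quantify the variation lost when only two components are plotted is the share of total variance carried by the remaining components. The sketch below is an assumption about how this could be computed, not the calculation behind the poster's figure. `variance_lost` is a hypothetical helper and the data are simulated.

```r
# Hypothetical helper: share of total variance kept and lost when only
# the first k principal components are plotted.
variance_lost <- function(X, k = 2) {
  pca <- stats::prcomp(X)
  share <- pca$sdev^2 / sum(pca$sdev^2)
  kept <- sum(share[seq_len(k)])
  c(kept = kept, lost = 1 - kept)
}

# Simulated 74 x 6 stand-in; with tourr installed, tourr::flea[, 1:6]
# could be passed instead.
set.seed(2018)
X <- matrix(rnorm(74 * 6), nrow = 74, ncol = 6)
variance_lost(X, k = 2)
```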

contact

Nick S Spyrison
nicholas.spyrison@monash.edu
github.com/nspyrison
scholar.google.com.au/  

Install

devtools::install_github("nspyrison/spinifex")
library(spinifex)

?spinifex::proj_data
?spinifex::slideshow

 

Thanks

  • Prof. Dianne Cook - Guidance, namesake, and contributions to projection pursuit
  • Dr. Ursula Laa - Collaboration, use cases, and dev feedback  

References

  1. H. Wickham, D. Cook, H. Hofmann, and A. Buja (2011). tourr: An R package for exploring multivariate data with projections. Journal of Statistical Software, 40(2). http://www.jstatsoft.org/v40.
  2. D. Asimov (1985). The grand tour: a tool for viewing multidimensional data. SIAM Journal on Scientific and Statistical Computing, 6(1), 128–143.
  3. D. Cook, & A. Buja (1997). Manual Controls for High-Dimensional Data Projections. Journal of Computational and Graphical Statistics, 6(4), 464–480. https://doi.org/10.2307/1390747
  4. H. Wickham, D. Cook, and H. Hofmann (2015). Visualising statistical models: Removing the blindfold (with discussion). Statistical Analysis and Data Mining, 8(4), 203–225.

more

devtools::install_github("nspyrison/tourr")
library(tourr)

?tourr::animate_groupxy
?tourr::animate_density2d

animate_groupxy()

animate_density2d()